This file documents all of the programs used to generate the results presented in:

Brevoort, Kenneth P., 2011, "Credit Card Redlining Revisited," The Review of Economics and Statistics.  93(2):  714-724.

Included files:

RES_CreateNewTables2.do
CreateNewTables2_output.log
RES_Summarystats.sas
RES_NewCompileData.sas
RES_NewCreateCensusVariables.py
RES_NewCreateCensusVariables_my.py
RES_FindWithin1Mile_new.py
RES_FindWithin1Mile_newBG.py
RES_CreateNewFigure1.m
RES_PlotScoreDifference.m
RES_DoBootstraps.m

Program files that have been prefaced with "RES_" have been altered to remove references to the directories where the data and programs were stored on the Federal Reserve Board's computer network.  Instead, the programs are now set up to be run from a single working directory.  I have also annotated these files where I thought doing so was helpful.

A Note on Data:
--------------

The main data for this project is a sample of credit records drawn from an anonymous credit bureau.  These data are proprietary and, to the best of my knowledge, are not accessible even for a fee.  The description of this that I provided to the editors states:

"It is my understanding that the exact sample used in my paper (which is the same sample used in Dr. Cohen-Cole's paper, which REStat has already accepted for publication) cannot be applied for.  (In fact, even if the credit bureau that supplied the data was willing to provide the data to a third party, I have been told that the necessary identifying information to draw the exact same sample has been lost.)  However, it would be possible for an individual to purchase a different sample of data for the same time period from the credit bureau, which should produce similar estimates as those provided in my paper."

Since this is the exact same dataset used by Cohen-Cole (2011, same issue of RE Stat), interested parties can also check the documentation of his programs to see if he has alternative ways to access the data.


Dataset Creation - RES_Summarystats.sas:
---------------------------------------

This file contains an annotated version of the code that generated the dataset used in the paper.  The credit record data come in four input datasets: credit103, credit103_2, credit104, and credit104_2.  These four files were sent by my research assistant at the time, Sean Wallace, to Ethan Cohen-Cole, who used these data as the basis for the paper I am replicating.  Two files were sent initially, credit103 and credit104, which contained the following variables for the 2003 and 2004 samples, respectively:

AGE  -- Individual's age
AT28 -- Total high credit/credit limit
AT34 -- Ratio balance to high credit
AT36 -- Months since most recent delinquency
BC02 -- Number of currently active bankcard trades
BC30 -- Percent of bankcard trades >50% of limit
BC31 -- Percent of bankcard trades >75% of limit
BI28 -- Total bank installment high credit/credit limit
BLOCK -- Census block group (4 digit)
BR28 -- Total bank revolving high credit/credit limit
BR33 -- Total balance of all bank revolving trades
COUNTY -- County FIPS code (3 digit)
DOB -- Individual's date of birth
G051 -- Percent of trades never delinquent
G082 -- Number of trades with currently past due balance
G103 -- Months since in activity
G104 -- Months on file
IN28 -- Total installment high credit/credit limit
IN33 -- Total balance of all installment trades
IN34 -- Ratio of installment balance to high credit
INC  -- Geography-based income variable (designates low-to-moderate income tracts)
LLONGL -- Type of geocodes supplied (B=block, T=tract)
LLONGV -- Longitude and latitude coordinates (character string variable)
MINPCT -- Minority composition of the tract
MSA -- Metropolitan Statistical Area
MT34 -- Ratio of balance to high credit on all mortgage trades
MT56 -- Highest amount ever on mortgage trade delinquency
OCC -- Occupation (largely unpopulated)
PCTIN903 -- Relative income of the tract
PERSON -- Individual code (created by Federal Reserve Board staff)
PF33 -- Total balance on all personal finance trades
PF34 -- Ratio personal finance balance to high credit
RE28 -- Total revolving high credit/credit limit
RE33 -- Total balance of all revolving trades
RE34 -- Ratio revolving balance to high credit
S002 -- birthdate (yymm)
S054 -- Number of different subscribers
S055 -- Number of unique accounts
STATE -- State FIPS code (2 digit)
TRACT -- Census tract code (6 digits)
TR_AM -- Credit score

At a later date, two files with additional information were sent, credit103_2 and credit104_2.  These files had the following variables (for the same sample of individuals as the earlier files) for 2003 and 2004, respectively.

G093 -- Number of derogatory public records
G094 -- Number of public record bankruptcies
G095 -- Months since most recent derog public record
PERSON -- Individual code (Created by Federal Reserve Board Staff)
S059 -- Number of public record and tradeline derogatories
S060 -- Number of trades with high method of payment (i.e., delinquency status)
S061 -- Months since most recent 60+ day rating
S062 -- Months since most recent 90+ day rating
S063 -- Total public record amounts
S064 -- Total collection amount ever owed
S065 -- Number of tax liens
S066 -- Number of disputed trades

This program merges these datasets together into EWORK.both.
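
For readers who do not use SAS, the kind of merge described above can be sketched in Python.  This is only an illustration, not the code in RES_Summarystats.sas: it assumes that the supplementary file is joined to the initial file on PERSON (the one variable listed in both sets of files) with one record per individual, and the helper names and toy records below are hypothetical.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dicts (stand-in for reading a data file)."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_on_person(base_rows, extra_rows, key="PERSON"):
    """Attach the variables in extra_rows to base_rows, matching on `key`.

    Assumption: each individual appears at most once in extra_rows.
    Rows in base_rows without a match are kept unchanged.
    """
    extra_by_key = {row[key]: row for row in extra_rows}
    merged = []
    for row in base_rows:
        combined = dict(row)
        for name, value in extra_by_key.get(row[key], {}).items():
            if name != key:
                combined[name] = value
        merged.append(combined)
    return merged


# Made-up toy records; the real data are proprietary and are not reproduced here.
credit103 = read_rows("PERSON,TR_AM\n1,700\n2,640\n")
credit103_2 = read_rows("PERSON,G094\n1,0\n2,1\n")

both_2003 = merge_on_person(credit103, credit103_2)
```

The same step would apply to credit104 and credit104_2; how the two years are then combined is left to the SAS program itself.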

Dataset Creation (continued) -- RES_NewCreateCensusVariables.py and RES_NewCreateCensusVariables_my.py:
------------------------------------------------------------------------------------------------------

These Python programs rely on the same input dataset of Census data, new_bg_censusdata2.csv.  This file contains block-group level information from the 2000 Census (each observation is a Census block group and the variables are as defined in the Census Bureau's Summary File 3).

The programs also use two other input files, NewBlockGroupMatches.csv and NewBlockGroupMatches_newBG.csv.  These files are created, respectively, by the programs FindWithin1Mile_new.py and FindWithin1Mile_newBG.py (both are Python scripts), which identify the block groups within 1 mile of each observation in the data using the radius approach described in the paper.  Both programs use block_geocodes.csv, a file containing the internal points (longitude and latitude) for each block group, as provided by the Census Summary File 3.  In addition, FindWithin1Mile_new.py makes use of a file, fordistance.csv, which is created by the SAS program "RE Stat Compile Data I.sas", described above.  Both RES_FindWithin1Mile_new.py and RES_FindWithin1Mile_newBG.py use the GeoPy extension for Python, the latest version of which can be downloaded from http://code.google.com/p/geopy/.
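
The 1-mile radius search can be illustrated with a short, self-contained sketch.  The archived programs compute distances with GeoPy; the sketch below substitutes the haversine (great-circle) formula so that it runs without any extension, and its results may differ slightly from GeoPy's.  The function names, the dictionary layout, and the coordinates are hypothetical and do not reflect the actual layout of block_geocodes.csv.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius; an approximation


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def block_groups_within(lat, lon, block_groups, radius_miles=1.0):
    """Return IDs of block groups whose internal point lies within the radius.

    block_groups maps a block-group ID to a (latitude, longitude) pair.
    """
    return [bg_id
            for bg_id, (bg_lat, bg_lon) in block_groups.items()
            if haversine_miles(lat, lon, bg_lat, bg_lon) <= radius_miles]


# Made-up internal points for three hypothetical block groups.
block_groups = {
    "A": (38.9000, -77.0000),  # same point as the observation
    "B": (38.9100, -77.0000),  # roughly 0.7 miles north
    "C": (38.9500, -77.0000),  # roughly 3.5 miles north
}

nearby = block_groups_within(38.9000, -77.0000, block_groups)
```

This brute-force loop compares one observation against every block group.  That is adequate for a sketch but slow for a full national file, and it says nothing about how the archived programs organize the search.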

These two programs both generate similar data elements, but RES_NewCreateCensusVariables_my.py generates the Census-data-based measures using the internal points provided by the Census Bureau, rather than the longitude and latitude coordinates that were supplied with the credit bureau data, which have been found to contain systematic errors.  

Dataset Creation (continued) -- RES_NewCompileData.sas:
------------------------------------------------------

RES_NewCompileData.sas takes the Census-based variables that were created by RES_NewCreateCensusVariables.py and RES_NewCreateCensusVariables_my.py and integrates them with EWORK.both, created by RES_Summarystats.sas.  The file then creates the variables used in the paper based upon the Census information.  This file also produces the summary statistics that are reported in table 1.


Table 1:
-------

This table provides summary statistics of the variables used in the paper.  The numbers in this table are produced by the "proc means" commands at the end of RES_Summarystats.sas.  

Table 2:
-------

Table 2 shows a single observation of data and was not generated by a program.

Tables 3 through 6:
------------------

Tables 3 through 6 were generated using the Stata program RES_CreateNewTables2.do.  The log file produced by this program is CreateNewTables2_output.log.


Figure 1:
--------

Figure 1 is composed of two graphs that are produced by RES_CreateNewFigure1.m, a Matlab program.  This program uses newfigure1_data.txt, which is generated by the Stata program that generated tables 3 through 6, RE Stat Table Creation.do.  NewFigure1.png is the file that contains the graphic image used for figure 1.

Figure 2:
--------

Figure 2 is produced by the Matlab program RES_PlotScoreDifference.m.  ScoreDifferenceGraphs.png is the output file that contains figure 2.

Figure 3:
--------

Figure 3 is generated by the program RES_CreateNewFigure1.m, which is described above.  NewFigure3.png is the output file that is used as figure 3.

Figure 4:
--------

Figure 4 is generated by the Matlab program RES_DoBootstraps.m.  This program uses as an input the dataset ForBootstraps, which is produced by the Stata program "RE Stat Table Creation.do" and then transformed into a Matlab dataset using Stat/Transfer.  NewBootstrapGraph.png is the file that contains the graphic image used for figure 4.
